Abstract

We examine the capacity of beamforming over a single-user, multi-antenna link taking into account 
the overhead due to channel estimation and limited feedback of channel state information. Multi-input 
single-output (MISO) and multi-input multi-output (MIMO) channels are considered subject to block 
Rayleigh fading. Each coherence block contains L symbols, and is spanned by T training symbols, B 
feedback bits, and the data symbols. The training symbols are used to obtain a Minimum Mean Squared 
Error estimate of the channel matrix. Given this estimate, the receiver selects a transmit beamforming 
vector from a codebook containing 2^B i.i.d. random vectors, and sends the corresponding B bits back
to the transmitter. We derive bounds on the beamforming capacity for MISO and MIMO channels and
characterize the optimal (rate-maximizing) training and feedback overhead (T and B) as L and the
number of transmit antennas Nt both become large. The optimal Nt is limited by the coherence time,
and increases as L/log L. For the MISO channel the optimal T/L and B/L (fractional overhead due
to training and feedback) are asymptotically the same, and tend to zero at the rate 1/log Nt. For the
MIMO channel the optimal feedback overhead B/L tends to zero faster (as 1/log^2 Nt).
I. Introduction

With perfect channel knowledge at the transmitter and receiver, the capacity of a multi-antenna 
system with independent Rayleigh fading increases with the number of antennas [1], [2]. In 
practice, the channel estimate at the receiver will not be perfect, and furthermore, this estimate 
must be quantized before it is relayed back to the transmitter. This has motivated work on the 
performance of feedback schemes with imperfect channel knowledge [3]-[9], and the design and 
performance of limited feedback schemes for Multi-Input Multi-Output (MIMO) and Multi-Input 
Single-Output (MISO) channels (e.g., see [9]-[17] and the recent survey paper [18]). All of the 
previous work on limited feedback assumes perfect channel knowledge at the receiver. Here we 
consider a model that takes into account both imperfect channel estimation at the receiver and 
limited channel state feedback. 

We focus on single-user MISO and MIMO links with rank-one precoders (beamforming), and 
study the achievable rate as a function of overhead for channel estimation and channel state 
feedback. Our objective is to characterize the optimal amount of overhead and the associated 
achievable rate, and to show how those scale with the system size (i.e., as the number of transmit 
and/or receive antennas become large). Motivated by practical systems, a pilot-based scheme for 
channel estimation is assumed. Given a finite coherence time, the number of antennas that can be 
used effectively is limited by the channel estimation error and quantization error associated with 
the transmit beam. We show how the optimal (rate-maximizing) number of transmit antennas 
scales with the system size. 

More specifically, an independent identically distributed (i.i.d.) block Rayleigh fading channel 
is considered in which the channel parameters are stationary within each coherence block, and are 
independent from block to block. The block length L is assumed to be constant, and the transmitted
codewords span many blocks, so that the maximum achievable rate is the ergodic capacity. Each 
coherence block contains T training symbols and D data symbols. Furthermore, we assume that 
after transmission of the training symbols, the transmitter waits for the receiver to relay B bits 
over a feedback channel, which specify a particular beamforming vector. This delay, in addition 
to the T training symbols, must occur within the coherence block, and is therefore counted as 
part of the packet overhead.^1

We assume that the receiver computes a Minimum Mean Square Error (MMSE) estimate of 
the channel, based on the training symbols, and uses the noisy channel estimate to choose a 
transmit beamforming vector. The Random Vector Quantization (RVQ) scheme in [14], [16], 
[21] is assumed in which the beamformer is selected from a codebook consisting of 2^B random
vectors, which are independent and isotropically distributed, and known a priori at the transmitter 
and receiver. The associated codebook index is relayed using B bits via a noiseless feedback 
channel to the transmitter. The capacity of this scheme with perfect channel estimation is analyzed 
in [14], [16], [17], [21], [22]. It is shown in [14] that the RVQ codebook is optimal (i.e., 
maximizes the capacity) in the large system limit in which the number of transmit antennas Nt and
B tend to infinity with fixed ratio $\bar{B} = B/N_t$. In [14], [23], RVQ has been observed to give
essentially optimal performance for systems with small Nt. Furthermore, for the MISO channel 
the performance averaged over the random codebooks can be explicitly computed [16]. 

The capacity with MMSE channel estimates at the receiver (with or without limited feedback) 
is unknown. We derive upper and lower bounds on the capacity with RVQ and limited feedback, 
which are functions of the number of training symbols T and feedback bits B. Given a fixed 
block size, or coherence time L, we then optimize the capacity bounds over B and T. Namely, 
small T leads to a poor channel estimate, which decreases capacity, whereas large T leads to an 
accurate channel estimate, but leaves few symbols in the packet for transmitting the message. 
This trade-off has been studied in [24], [25] for MIMO channels without feedback. Here there 
is also an optimal amount of feedback B, which increases with the training interval T. That is, 
more feedback is needed to quantize more accurate channel estimates. 

We characterize the optimal overhead due to training and feedback in the large system limit as 
the coherence time L and number of transmit antennas Nt both tend to infinity with fixed ratio 
$\bar{L} = L/N_t$. For the MIMO channel we also let the number of receiver antennas $N_r \to \infty$ with
fixed $N_t/N_r$. This allows a characterization of the achievable rate as a function of the number
of feedback bits per degree of freedom [14].^2

^1 An implicit assumption is that the transmitter cannot learn the channel by detecting a received signal in the reverse direction,
as in some Time-Division Duplex systems (e.g., see [19]). Although the feedback overhead is counted as part of the coherence
time, a similar penalty arises with a Frequency-Division Duplex model [20].

^2 See also the tutorial on large random matrix theory [26].
For both MISO and MIMO channels the optimal normalized training T/L, which
maximizes the bounds on capacity, tends to zero at the rate 1/log Nt. For the MISO channel
the normalized feedback B/L also tends to zero at this rate. Moreover, the training and
feedback require the same asymptotic overhead. For the MIMO channel the optimal B/L
tends to zero at the rate 1/log^2 Nt. Hence the overhead due to feedback is lower for the MIMO
channel than for the MISO channel. This is apparently due to the additional degrees of freedom 
at the receiver, which can compensate for the performance loss associated with quantization 
error. 

For both MISO and MIMO channels, the optimal T increases as Nt/ log Nt, and we observe 
that the associated capacity can be achieved by activating only Nt/ log Nt antennas (assuming 
Nt increases linearly with L). Equivalently, for this pilot-based scheme with limited feedback, 
the optimal number of (active) transmit antennas increases as L/logL. Hence the training and 
feedback overhead pose a fundamental limit on the number of antennas that can be effectively 
used. The capacity with optimized overhead grows as log Nt. This is the same as with perfect 
channel knowledge; however, there is a second-order loss term, which increases as log log Nt.

A similar type of model for optimizing feedback overhead has been previously considered in 
[20]. A key difference is that here the relation between training and channel estimation error is 
explicitly taken into account. The model we present is also closely related to the two-way limited 
feedback system considered in [27], [28] (see also [19]). However, here the feedback channel 
is simply modeled with a fixed rate (i.e., is not the result of an optimization), and reflects the 
likelihood that the forward channel may be quite different from the reverse (feedback) channel. 
Also, the scaling of the optimal overhead and capacity with system size, given a fixed coherence 
time and fixed feedback rate, is not addressed in the preceding references. Similar types of 
overhead and capacity scaling results to those presented here are presented in [29] for a single- 
user wideband multi-carrier channel and in [30] for the cellular downlink based on Orthogonal 
Frequency Division Multiple Access. 

The rest of the paper is organized as follows. Section II describes the multi-antenna channel
model. Bounds on the beamforming capacity for the MISO channel with channel estimation
and limited feedback are presented in Section III along with a characterization of the optimal
(capacity-maximizing) training and feedback lengths in the large system limit. Corresponding
results for the MIMO channel are presented in Section IV. Numerical results for finite-size MISO
and MIMO channels are shown in Section V, and conclusions are presented in Section VI.

II. System Model 

We consider a point-to-point i.i.d. block fading channel with Nt transmit antennas and Nr
receive antennas. A rich scattering environment is assumed so that the channel gains corresponding
to different pairs of transmit/receive antennas are independent and Rayleigh distributed. The
ith $N_r \times 1$ received vector in a particular block is given by

$$r(i) = H v b(i) + n(i) \quad \text{for } 1 \le i \le D \qquad (1)$$

where $H$ is an $N_r \times N_t$ channel matrix whose elements are independent, complex Gaussian
random variables with zero mean and unit variance, $v$ is an $N_t \times 1$ unit-norm beamforming vector,
$b$ is the transmitted symbol with unit variance, $n$ is additive white Gaussian noise (AWGN) with
covariance $\sigma_n^2 I$, and D is the number of data (information) symbols in a block.
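The signal model in (1) can be illustrated with a few lines of code. The following is a minimal sketch using only the Python standard library; the helper names (`complex_gaussian`, `rayleigh_channel`, `received_vector`) and the example beamformer are illustrative assumptions and do not appear in the paper.

```python
import math
import random

def complex_gaussian(rng, variance=1.0):
    """Draw one circularly symmetric complex Gaussian sample with the given variance."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def rayleigh_channel(n_r, n_t, rng):
    """N_r x N_t matrix (list of rows) of i.i.d. CN(0, 1) channel gains."""
    return [[complex_gaussian(rng) for _ in range(n_t)] for _ in range(n_r)]

def received_vector(H, v, b, noise_var, rng):
    """Return r = H v b + n for one transmitted symbol b, as in (1)."""
    return [sum(h * x for h, x in zip(row, v)) * b + complex_gaussian(rng, noise_var)
            for row in H]

rng = random.Random(0)
H = rayleigh_channel(2, 4, rng)       # N_r = 2, N_t = 4 (hypothetical sizes)
v = [0.5, 0.5, 0.5, 0.5]              # a unit-norm beamformer chosen only for illustration
r = received_vector(H, v, 1.0, 0.1, rng)
```

Setting `noise_var` to zero returns the noiseless projection of the channel onto the beamformer, which is a convenient sanity check.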

A. Random Vector Quantization 

In prior work [14], we have analyzed the channel capacity with perfect channel knowledge at 
the receiver, but with limited channel knowledge at the transmitter. Specifically, the optimal 
beamformer is quantized at the receiver, and the quantized version is relayed back to the 
transmitter. Given the quantization codebook $\mathcal{V} = \{v_1, \ldots, v_{2^B}\}$, which is also known a priori
at the transmitter, and the channel H, the receiver selects the quantized beamforming vector to
maximize the instantaneous rate,

$$v(H) = \arg\max_{v_j \in \mathcal{V}} \left\{ \log(1 + \rho \|H v_j\|^2) \right\} \qquad (2)$$

where $\rho = 1/\sigma_n^2$ is the background signal-to-noise ratio (SNR). The (uncoded) index for the
rate-maximizing beamforming vector is relayed to the transmitter via an error-free feedback
link. The capacity depends on the beamforming codebook $\mathcal{V}$ and B. With unlimited feedback
($B \to \infty$) the $v(H)$ that maximizes the capacity is the eigenvector of $H^\dagger H$, which corresponds
to the maximum eigenvalue.

We will assume that the codebook vectors are independent and isotropically distributed over 
the unit sphere. It is shown in [14], [21] that this RVQ scheme is optimal (i.e., maximizes the 
achievable rate) in the large system limit in which $(B, N_t, N_r) \to \infty$ with fixed normalized
feedback $\bar{B} = B/N_t$ and $\bar{N}_r = N_r/N_t$. (For the MISO channel $N_r = 1$.) Furthermore, the
corresponding capacity grows as $\log(\rho N_t)$, which is the same order of growth as with perfect
channel knowledge at the transmitter. Although, strictly speaking, RVQ is suboptimal for a finite-size
system, numerical results indicate that the average performance is often indistinguishable from
the performance with optimized codebooks [14], [23].
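The RVQ selection rule can be sketched directly from its description: draw 2^B independent isotropic unit vectors and pick the one that maximizes the instantaneous rate. The code below is an illustrative sketch, not the authors' implementation; the function names are hypothetical, and isotropic vectors are generated by normalizing i.i.d. complex Gaussian entries (a standard construction).

```python
import math
import random

def random_unit_vector(n, rng):
    """Isotropic unit vector in C^n: normalize i.i.d. complex Gaussian entries."""
    g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in g))
    return [x / norm for x in g]

def rvq_codebook(n_t, bits, rng):
    """Codebook of 2^B independent isotropic unit vectors."""
    return [random_unit_vector(n_t, rng) for _ in range(2 ** bits)]

def beamforming_gain(H, v):
    """||H v||^2 for a channel matrix H given as a list of rows."""
    return sum(abs(sum(h * x for h, x in zip(row, v))) ** 2 for row in H)

def rvq_select(H, codebook, rho):
    """Index and rate of the entry maximizing log(1 + rho ||H v_j||^2)."""
    rates = [math.log(1.0 + rho * beamforming_gain(H, v)) for v in codebook]
    best = max(range(len(codebook)), key=rates.__getitem__)
    return best, rates[best]

rng = random.Random(0)
codebook = rvq_codebook(4, 3, rng)                          # N_t = 4, B = 3 bits
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]]  # one MISO draw
index, rate = rvq_select(H, codebook, rho=3.0)
```

The exhaustive search costs 2^B gain evaluations per channel realization, which is why simulations of this scheme are practical only for small B.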

B. Channel Estimation 

In addition to limited channel information at the transmitter, here we also account for channel 
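estimation error at the receiver. Letting $\hat{H}$ be the estimated channel matrix, the receiver selects
$v(\hat{H})$ assuming that $\hat{H}$ is the actual channel, i.e.,

$$v(\hat{H}) = \arg\max_{v_j \in \mathcal{V}} \left\{ \log(1 + \rho \|\hat{H} v_j\|^2) \right\}. \qquad (3)$$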

We will assume that the receiver computes the linear MMSE estimate of H given the 
received vectors corresponding to T training vectors. Specifically, the transmitter transmits T 
training symbols $b_T(1), \ldots, b_T(T)$, where the training symbol $b_T(i)$ modulates the corresponding
beamforming vector $v_T(i)$. For the MISO channel the row vector of T received samples is given
by

$$r_T = h V_T B_T + n_T \qquad (4)$$

where the channel $h$ is a $1 \times N_t$ row vector, $V_T = [v_T(1) \cdots v_T(T)]$, $B_T = \mathrm{diag}\{b_T(i)\}$, and
$n_T = [n(1) \cdots n(T)]$. The channel estimate is $\hat{h} = r_T C$, where the $T \times N_t$ linear MMSE
channel estimation filter is given by

$$C = \arg\min_{\tilde{C}} E[\|h - r_T \tilde{C}\|^2] \qquad (5)$$

$$\phantom{C} = (B_T^\dagger V_T^\dagger V_T B_T + \sigma_n^2 I)^{-1} B_T^\dagger V_T^\dagger. \qquad (6)$$



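The MSE

$$\sigma_w^2 = E[|h_i - \hat{h}_i|^2] = 1 - \frac{1}{N_t} \mathrm{trace}\{C^\dagger R_T C\} \qquad (7)$$

where $h_i$ and $\hat{h}_i$ are the ith elements of $h$ and $\hat{h}$, respectively, and the received covariance matrix

$$R_T = E[r_T^\dagger r_T] = B_T^\dagger V_T^\dagger V_T B_T + \sigma_n^2 I. \qquad (8)$$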

The preceding expressions also apply to the MIMO channel where the estimation is for a 
particular row of H. That is, C is replaced by $C_i$, which is applied to the ith receiver antenna,
and used to estimate the ith row of H. The MSE for each element of H therefore remains the
same. 

Because the elements of H are assumed to be complex i.i.d. Gaussian random variables, we 
have 

$$H = \hat{H} + w \qquad (9)$$

where the estimate $\hat{H}$ and the error matrix $w$ are independent, and each contains i.i.d. complex
Gaussian elements. The elements of $w$ have zero mean and variance $\sigma_w^2$, so that $\hat{H}$ has zero
mean and covariance $(1 - \sigma_w^2) I$.
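The decomposition of the channel into an estimate plus an uncorrelated error can be checked numerically in the simplest case. The sketch below is an illustration only: it assumes a single channel coefficient trained with one unit-amplitude pilot, so the linear MMSE estimate reduces to a scalar scaling, and the function name `mmse_single_pilot` is hypothetical.

```python
import math
import random

def mmse_single_pilot(noise_var, num_trials, rng):
    """Monte Carlo check of h = h_hat + w for one CN(0,1) gain and one unit pilot.

    Returns (empirical variance of h_hat, empirical variance of w,
    magnitude of the empirical cross-correlation E[h_hat w*]).
    """
    s = math.sqrt(0.5)
    var_hat = 0.0
    var_w = 0.0
    cross = 0j
    for _ in range(num_trials):
        h = complex(rng.gauss(0, s), rng.gauss(0, s))
        n = complex(rng.gauss(0, s), rng.gauss(0, s)) * math.sqrt(noise_var)
        r = h + n                        # received pilot sample (pilot symbol = 1)
        h_hat = r / (1.0 + noise_var)    # scalar linear MMSE estimate
        w = h - h_hat                    # estimation error
        var_hat += abs(h_hat) ** 2
        var_w += abs(w) ** 2
        cross += h_hat * w.conjugate()
    return var_hat / num_trials, var_w / num_trials, abs(cross) / num_trials

var_hat, var_w, cross = mmse_single_pilot(0.5, 20000, random.Random(0))
```

In this scalar case the error variance is noise_var/(1 + noise_var) and the estimate has variance 1 minus that value, so the two empirical variances should sum to approximately one while the cross-correlation stays near zero.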

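The variance $\sigma_w^2$ clearly decreases as T increases. Furthermore, since the beamforming vectors
during training $V_T$ are known a priori to the transmitter and receiver, they can be chosen to
minimize the MSE. It is shown in [24] that the corresponding set of (unit-norm) beamforming
vectors achieves the Welch bound with equality. We therefore have that [31]

$$V_T V_T^\dagger = \bar{T} I \quad \text{if } T \ge N_t, \qquad (10)$$

$$V_T^\dagger V_T = I \quad \text{if } T \le N_t, \qquad (11)$$

where $\bar{T} = T/N_t$. Applying (6)-(11), we obtain the variance of the estimation error

$$\sigma_w^2 = \begin{cases} 1 - \dfrac{\bar{T}}{1 + \sigma_n^2}, & \bar{T} < 1, \\[2mm] \dfrac{1}{1 + \rho \bar{T}}, & \bar{T} \ge 1. \end{cases} \qquad (12)$$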
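The dependence of the estimation error variance on the training length can be made concrete with a short function. The closed form used below is the one that follows from the MMSE filter and the Welch-bound conditions (10)-(11) when the training symbols have unit modulus; that assumption, and the function name, are ours rather than the paper's.

```python
def estimation_error_variance(T, n_t, noise_var):
    """Per-coefficient MMSE error variance for T training symbols and N_t antennas.

    Assumes unit-modulus training symbols and training vectors that meet the
    Welch bound with equality; noise_var must be positive.
    """
    t_bar = T / n_t                      # normalized training length T / N_t
    if t_bar < 1.0:
        # Orthonormal training vectors span only a T-dimensional subspace.
        return 1.0 - t_bar / (1.0 + noise_var)
    # All N_t dimensions are trained; extra symbols average down the noise.
    return 1.0 / (1.0 + t_bar / noise_var)

# Error variance versus training length for N_t = 8 and noise variance 0.5.
curve = [estimation_error_variance(T, 8, 0.5) for T in range(0, 25)]
```

The two branches agree at T = N_t, and the variance equals one with no training, which corresponds to an estimate that is uncorrelated with the channel.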

C. Ergodic Capacity 

In what follows, we assume that the forward and feedback links are time-division multiplexed, 
and each block consists of T training symbols, B feedback bits, and D data symbols. Given 
that the size of each block is L symbols, we have the constraint 

$$L = T + \mu B + D \qquad (13)$$

where $\mu$ is a conversion factor, which relates bits to symbols. Our objective is to maximize the
ergodic capacity, which is the maximum mutual information between b and r,

$$\max_{T,B} \left\{ C = E\left[ \max_{p_b} I(r; b \mid H, \hat{H}, v(\hat{H})) \right] \right\} \qquad (14)$$

subject to (13), where $p_b$ is the probability density function (pdf) for the transmitted symbol
b, and the expectation is over the channel H, the estimation error w, and the RVQ codebook
$\mathcal{V}$. Determining the ergodic capacity of RVQ with channel estimation appears to be intractable,
so instead we derive upper and lower bounds, which are functions of D, B, and T. We then
maximize both bounds over {D, B, T}, subject to (13).



III. Multi-Input Single-Output Channel 

A. Capacity Bounds 

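We first consider a MISO channel with $1 \times N_t$ channel vector $h$. Applying Jensen's inequality,
we obtain the upper bound on ergodic capacity

$$C = E\left[ \max_{p_b} I(b; r \mid \hat{h}, v(\hat{h}), h) \right] \qquad (15)$$

$$\phantom{C} = E[\log(1 + \rho |h v(\hat{h})|^2)] \qquad (16)$$

$$\phantom{C} \le \log(1 + \rho E[|h v(\hat{h})|^2]) \qquad (17)$$

where the maximizing pdf is Gaussian, and the expectation is over $h$, the estimation error $w$, and
the random codebook $\mathcal{V}$. Substituting $h = \hat{h} + w$ into the expectation in (17) and simplifying
gives

$$E[|h v(\hat{h})|^2] = \sigma_w^2 + E[|\hat{h} v(\hat{h})|^2]. \qquad (18)$$

Since $\|\hat{h}\|^2$ and $\nu = |\hat{h} v(\hat{h})|^2 / \|\hat{h}\|^2$ are independent [13], [16], we have

$$E[|\hat{h} v(\hat{h})|^2] = E[\|\hat{h}\|^2] E[\nu] = (1 - \sigma_w^2) N_t E[\nu]. \qquad (19)$$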



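With RVQ we have

$$\nu = \max_{1 \le j \le 2^B} \left\{ \nu_j = |\hat{h} v_j|^2 / \|\hat{h}\|^2 \right\} \qquad (20)$$

where the $\nu_j$'s are i.i.d. with pdf given in [12]. The pdf for $\nu$ and associated mean can be
explicitly computed [16]. The mean is given by

$$E[\nu] = 1 - 2^B \, \mathrm{B}\!\left(2^B, \frac{N_t}{N_t - 1}\right) \qquad (21)$$

where the beta function $\mathrm{B}(m, n) = \int_0^1 t^{m-1} (1 - t)^{n-1} \, dt$ for m and n > 0. We can bound $E[\nu]$
as follows.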

Lemma 1: For $B \ge 0$ and $N_t \ge 2$,

^ J - ^ A/-, _ 1 ^ ^ 

> 1-2'^ (23) 
where $\gamma = 0.5772\ldots$ is the Euler constant.

The proof is given in Appendix A. We note that $E[\nu] \to 1 - 2^{-\bar{B}}$ as $N_t \to \infty$. Substituting
(18)-(22) into (17) gives an upper bound on capacity.
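The mean of the normalized beamforming gain can also be evaluated numerically from the beta-function expression, E[nu] = 1 - 2^B B(2^B, N_t/(N_t - 1)). The sketch below is illustrative: it uses the standard library's log-gamma function, and for large codebooks it switches to the leading terms of the large-argument expansion of the gamma ratio to avoid cancellation. The switch-over point at 20 bits is an arbitrary choice of ours.

```python
import math

def expected_nu(bits, n_t):
    """E[nu] = 1 - 2^B * Beta(2^B, N_t / (N_t - 1)) for N_t >= 2."""
    n = 2.0 ** bits
    a = n_t / (n_t - 1.0)
    if bits <= 20:
        # Direct evaluation of log(Gamma(n) / Gamma(n + a)).
        log_ratio = math.lgamma(n) - math.lgamma(n + a)
    else:
        # Leading terms of the large-n expansion of the same log-ratio.
        log_ratio = -a * math.log(n) - a * (a - 1.0) / (2.0 * n)
    return 1.0 - math.exp(math.log(n) + math.lgamma(a) + log_ratio)

# Mean gain versus feedback bits for N_t = 4, next to the large-system
# limit 1 - 2^(-B/N_t) for comparison.
table = [(b, expected_nu(b, 4), 1.0 - 2.0 ** (-b / 4.0)) for b in range(0, 13, 4)]
```

With no feedback (B = 0) the codebook contains a single random vector and the expression reduces to 1/N_t, which gives a quick check of the implementation.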

To derive a lower bound on capacity, we use the estimation error equation $h = \hat{h} + w$ to
write

$$r(i) = (\hat{h} v(\hat{h})) b(i) + \underbrace{(w v(\hat{h})) b(i) + n(i)}_{z(i)}. \qquad (24)$$

Since $w$ and $\hat{h}$ are independent, it follows that $E[z(i) b(i)] = 0$. It is shown in [24], [32] that
replacing $z(i)$ with a zero-mean Gaussian random variable minimizes the mutual information
$I(r; b \mid \hat{h}, v(\hat{h}))$ and therefore gives a lower bound on the capacity with channel estimation and
quantized beamforming. The lower bound is maximized when $b(i)$ has a Gaussian pdf, i.e.,

$$C \ge E\left[ \max_{p_b} \min_{p_z} I(r; b \mid \hat{h}, v(\hat{h})) \right] = E\left[ \log\left(1 + \frac{|\hat{h} v(\hat{h})|^2}{\sigma_z^2}\right) \right] \qquad (25)$$

where $p_z$ and $\sigma_z^2$ denote the pdf and variance for z, respectively. We derive the following lower
bound on C by applying the inequality in [33].
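Lemma 2:

$$E\left[ \log\left(1 + \frac{1}{\sigma_z^2} |\hat{h} v(\hat{h})|^2\right) \right] \ge (1 - d(N_t)) \log\left(1 + \frac{1}{\sigma_z^2} E[|\hat{h} v(\hat{h})|^2]\right) \qquad (26)$$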



where 



d{N,) 



+ 1 



1 

N't 



r 1 



Nt-l 



- r2 1 + 



Nt-l 



'1 + 2~^^ty^^ 



r 1 + 



1 



Nt-l 



(27) 



and the gamma function $\Gamma(m) = \int_0^\infty t^{m-1} e^{-t} \, dt$ for m > 0.

The proof is given in Appendix B. We note that $d(N_t) \to 0$ as $N_t \to \infty$.

To obtain a lower bound on capacity C, we substitute $\sigma_z^2 = \sigma_w^2 + \sigma_n^2$, (23), and (26)-(27) into
(25). The capacity bounds are summarized as follows.

Theorem 1: The capacity for a MISO channel with channel estimation variance $\sigma_w^2$ and
normalized feedback $\bar{B}$ satisfies

$$C_l \le C \le C_u \quad \text{for } \bar{B} \ge 0 \text{ and } N_t \ge 2 \qquad (28)$$

where

G = (1 - diN,)) log (l + P^^^il - 2-^)n)i , (29) 

Cu = log (l + pal + p(l - al)N, (l - 2'^ + l+h^^^_l±^^ ^ . (30) 
The gap between the two bounds tends to zero as $\rho \to 0$ (since both $C_u$ and $C_l$ tend to zero),
and as $N_t \to \infty$. With fixed $\bar{B}$ and $\sigma_w^2$, the bounds (and the capacity) grow as $O(\log N_t)$ as
$N_t \to \infty$. Substituting (12) for $\sigma_w^2$ gives the bounds as a function of training T.

Fig. 1 compares the bounds in Theorem 1 with (16) and the tighter lower bound (25). The
bounds are plotted versus Nt with parameters B/Nt = 1 (one bit per antenna coefficient),
$\sigma_w^2 = 0.15$, and SNR $\rho = 5$ dB. The tighter bounds, which are analytically intractable, are
evaluated by Monte Carlo simulation and shown as o's and x's in the figure. The plots show
that the upper bound in Theorem 1 is close to (16) even for small Nt, while the lower bound in
the theorem is close to (25) for much larger Nt. Since RVQ requires an exhaustive search over
the codebook, and the number of entries in the codebook grows exponentially with the number 
of antennas, simulation results are not shown for Nt > 12. As expected, both the upper and 
lower bounds grow at the same rate as Nt increases. 

B. Asymptotic Behavior 

We now study the behavior of the optimal T, B and D, and the capacity as $N_t \to \infty$. With D
transmitted symbols in an L-symbol packet the effective capacity is $\mathcal{C} = (\bar{D}/\bar{L}) C$, where $\bar{D} = D/N_t$
and $\bar{L} = L/N_t$ (and likewise $\bar{T} = T/N_t$). The associated bounds are $\mathcal{C}_u = (\bar{D}/\bar{L}) C_u$ and
$\mathcal{C}_l = (\bar{D}/\bar{L}) C_l$. From Theorem 1 and (12), we can write $\mathcal{C}_l$ and $\mathcal{C}_u$ as functions of
$\{\bar{T}, \bar{B}, \bar{D}\}$ and optimize, i.e., for the lower bound we wish to

$$\max_{\bar{T}, \bar{B}, \bar{D}} \mathcal{C}_l \qquad (31)$$

$$\text{subject to } \bar{T} + \mu \bar{B} + \bar{D} = \bar{L}. \qquad (32)$$

Let $\{\bar{T}_l^o, \bar{B}_l^o, \bar{D}_l^o\}$ denote the optimal values of $\bar{T}$, $\bar{B}$, and $\bar{D}$, respectively, and let $\mathcal{C}_l^o$ denote the
maximized lower bound on capacity. Similarly, maximizing the upper bound gives the optimal
parameters $\{\bar{T}_u^o, \bar{B}_u^o, \bar{D}_u^o\}$ and the corresponding bound $\mathcal{C}_u^o$. These optimized values can be easily
computed numerically, and also allow us to characterize the asymptotic behavior of the actual
capacity.^3

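Theorem 2: Let $\{\bar{T}^o, \bar{B}^o, \bar{D}^o\} = \arg\max_{\{\bar{T}, \bar{B}, \bar{D}\}} \mathcal{C}$ subject to (32). As $N_t \to \infty$,

$$\bar{T}^o \log N_t \to \bar{L} \qquad (33)$$

$$\bar{B}^o \log N_t \to \frac{1}{\mu} \bar{L} \qquad (34)$$

and the capacity satisfies

$$\mathcal{C}^o - \log(\rho N_t) + 2 \log\log N_t \to \zeta \qquad (35)$$

where $\zeta$ is a constant bounded by

$$\zeta^* - \log(1 + \rho) \le \zeta \le \zeta^* \qquad (36)$$

where $\zeta^* = \log(\bar{L}^2 \log(2)) - \log(\mu (1 + \rho^{-1})) - 2$.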

The proof is given in Appendix C. Combining (33) and (34) with (32) gives the corresponding
behavior of the data segment

$$\frac{\bar{D}^o}{\bar{L}} = 1 - \delta(N_t) \qquad (37)$$

where $\delta(N_t) \log N_t / 2 \to 1$.

According to the theorem, as Nt becomes large, to maximize the achievable rate the fraction of
L devoted to training and feedback tends to zero, in which case the rate increases as
$\log(\rho N_t) - 2 \log\log N_t$. The achievable rate with RVQ and perfect channel estimation is $E[\log(1 + \rho \|h\|^2)]$,
which grows as $\log(\rho N_t)$. Hence the loss of $2 \log\log N_t$ is due to imperfect channel estimation.^4
Theorem 2 also implies that $\mu B / T \to 1$, i.e., the fraction of the packet devoted to feedback
is asymptotically the same as that for training. This equal allocation therefore balances the
reductions in capacity due to estimation and quantization.
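The trade-off between training, feedback, and data symbols can be explored numerically with a simple grid search. The sketch below is not the bound of Theorem 1: it maximizes a proxy, (D/L) log(1 + SNR_eff), where SNR_eff is the effective SNR of the Gaussian-noise lower bound evaluated at the mean beamforming gain, and it omits the correction factor (1 - d(N_t)). The error-variance formula assumes unit-modulus training symbols and Welch-bound training vectors, and all function names are hypothetical.

```python
import math

def error_variance(T, n_t, rho):
    """MMSE error variance with noise variance 1/rho (assumed closed form)."""
    t_bar = T / n_t
    if t_bar < 1.0:
        return 1.0 - t_bar * rho / (1.0 + rho)
    return 1.0 / (1.0 + rho * t_bar)

def expected_nu(bits, n_t):
    """E[nu] = 1 - 2^B * Beta(2^B, N_t / (N_t - 1)), stable for large B."""
    n = 2.0 ** bits
    a = n_t / (n_t - 1.0)
    if bits <= 20:
        log_ratio = math.lgamma(n) - math.lgamma(n + a)
    else:
        log_ratio = -a * math.log(n) - a * (a - 1.0) / (2.0 * n)
    return 1.0 - math.exp(math.log(n) + math.lgamma(a) + log_ratio)

def rate_proxy(T, B, L, n_t, rho, mu=1.0):
    """(D/L) log(1 + effective SNR); a proxy, not the paper's lower bound."""
    D = L - T - mu * B
    if D <= 0:
        return 0.0
    var_w = error_variance(T, n_t, rho)
    signal = (1.0 - var_w) * n_t * expected_nu(B, n_t)   # mean of |h_hat v|^2
    snr = rho * signal / (1.0 + rho * var_w)             # noise plus estimation error
    return (D / L) * math.log(1.0 + snr)

def optimize_overhead(L, n_t, rho, mu=1.0):
    """Exhaustive search over integer T and B; returns (rate, T, B)."""
    best = (0.0, 0, 0)
    for T in range(L + 1):
        for B in range(int((L - T) / mu) + 1):
            r = rate_proxy(T, B, L, n_t, rho, mu)
            if r > best[0]:
                best = (r, T, B)
    return best

rho = 10.0 ** (5.0 / 10.0)                 # 5 dB
best_rate, best_T, best_B = optimize_overhead(100, 6, rho)
```

The proxy is zero when T = 0, since the estimate is then uncorrelated with the channel, and when the overhead consumes the whole block, so the maximum lies in the interior of the search region.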

^3 In what follows all logarithms are assumed to be natural.

^4 The capacity estimate in the theorem becomes accurate when Nt is large enough so that $\bar{L}/\log N_t$ is small, in which case
the loss term $2 \log\log N_t$ is greater than the constant offset $\zeta$.

The preceding analysis applies if the beamforming vectors during training are chosen to be unit
vectors. Namely, the matrix $V_T$ can be taken to be diagonal, which corresponds to transmitting
the sequence of training symbols over the transmit antennas successively one at a time. Hence
the fact that the optimal T increases as Nt/log Nt implies that only Nt/log Nt antennas are
activated. Since $\bar{L} = L/N_t$ is fixed, we conclude that as the coherence time L increases, the
optimal number of transmit antennas should increase as L/log L. The training and feedback
overhead therefore reduces the number of antennas that can be effectively used by a factor of
1/log L.
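As a rough numerical illustration of this scaling, consider the leading-order asymptotics of Theorem 2 with natural logarithms. The example values $N_t = 100$, $\bar{L} = 1$ (so $L = 100$), and $\mu = 1$ are assumptions chosen for illustration, and the resulting numbers are indicative only, since the asymptotic expressions need not be accurate at finite $N_t$.

```latex
\begin{align*}
T^o &\approx \frac{L}{\log N_t} = \frac{100}{\log 100} \approx 21.7 \text{ symbols}, \\
\mu B^o &\approx \frac{L}{\log N_t} \approx 21.7 \text{ symbols}, \\
\frac{D^o}{L} &\approx 1 - \frac{2}{\log N_t} \approx 1 - 0.434 = 0.566 .
\end{align*}
```

Under these assumptions roughly 22 of the 100 antennas would be trained and activated, and the overhead fraction of about 0.43 shows how slowly the 1/log Nt terms decay.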

IV. Multi-Input Multi-Output Channel

In this section, we let the number of receive antennas Nr scale with Nt. As for the MISO
channel, we can bound the capacity with limited training and feedback as follows,

$$C \le C_u = \log(1 + \rho \sigma_w^2 + \rho E[\eta]) \qquad (38)$$

$$C \ge C_l = (1 - c(N_t)) \log\left(1 + \frac{\rho}{1 + \rho \sigma_w^2} E[\eta]\right) \qquad (39)$$

where $\eta = v(\hat{H})^\dagger \hat{H}^\dagger \hat{H} v(\hat{H})$ and



<Nt) = (40) 



where $\sigma_\eta$ is the standard deviation of $\eta$.

We would like to express the bounds (38) and (39) as functions of T and B. As discussed in
Section II, the variance of the estimation error is again given by (12). Although it is difficult to
evaluate $E[\eta]$ explicitly for finite (Nt, Nr, B), it can be computed in the large system limit as
the parameters tend to infinity with fixed ratios $\bar{N}_r = N_r/N_t$ and $\bar{B}$. Specifically, since $\hat{H}$ has
i.i.d. elements with variance $1 - \sigma_w^2$, we have

$$\frac{1}{N_t} \eta \to (1 - \sigma_w^2) \gamma_{\mathrm{rvq}} \qquad (41)$$

in the mean square sense, where the asymptotic received signal power with RVQ $\gamma_{\mathrm{rvq}}$ is evaluated
in [14], and is a function of $\bar{N}_r$ and $\bar{B}$. Therefore

$$E[\eta] = (1 - \sigma_w^2) \gamma_{\mathrm{rvq}} N_t + \kappa(N_t) \qquad (42)$$

where $\kappa(N_t)/N_t \to 0$. Characterizing $\kappa(N_t)$ explicitly appears to be difficult, but this is not
needed to prove the following theorem.^5 Substituting (42) and (12) into (38) and (39) gives
upper and lower bounds on the capacity, $C_u$ and $C_l$, respectively, as functions of T and B.
Maximizing both bounds over T and B leads to the following theorem, which characterizes the
asymptotic behavior of the actual capacity.

^5 We will assume that $\kappa(N_t)$ is a smooth function of T and B for all Nt, and that $\kappa(N_t)/N_t$ converges to zero uniformly
over all T and B.

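Theorem 3: Let $\{\bar{T}^o, \bar{B}^o, \bar{D}^o\} = \arg\max_{\{\bar{T}, \bar{B}, \bar{D}\}} \mathcal{C}$ subject to (32). As $(N_t, N_r) \to \infty$ with
fixed $\bar{N}_r = N_r/N_t$,

$$\bar{T}^o \log N_t \to \bar{L} \qquad (43)$$

$$\bar{B}^o \log^2 N_t \to \frac{\bar{L}^2 \log 2}{2 \mu^2 \bar{N}_r} \qquad (44)$$

and the capacity satisfies

$$\mathcal{C}^o - \log(\rho N_t) + \log\log N_t \to \zeta \qquad (45)$$

where

$$\zeta^* - \log(1 + \rho) \le \zeta \le \zeta^* \qquad (46)$$

and $\zeta^* = \log(\bar{L} \bar{N}_r) - \log(1 + \rho^{-1}) - 1$.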

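The proof is given in Appendix D. Combining (43), (44), and (32) gives the corresponding
behavior of the optimized data segment

$$\frac{\bar{D}^o}{\bar{L}} = 1 - \epsilon_1(N_t) - \epsilon_2(N_t) \qquad (47)$$

where $\epsilon_1(N_t) \log N_t \to 1$ and $\frac{2 \mu \bar{N}_r}{\bar{L} \log 2} \epsilon_2(N_t) \log^2 N_t \to 1$.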

Theorem 3 states that the optimal training length for the MIMO channel grows as Nt/log Nt,
which is the same as for the MISO channel. Hence as Nt becomes large, only Nt/log Nt
transmit antennas should be activated. (All receive antennas are used, since this does not change
the training overhead.)

Theorem 3 also states that the capacity with limited training and feedback increases as
$\log(\rho N_t) - \log\log N_t$. For large Nt the loss in achievable rate due to training and feedback
therefore increases as log log Nt, as opposed to 2 log log Nt for the MISO channel. This gain is
due to the smaller MIMO feedback overhead. Namely, because of the additional antennas for
the MIMO channel, the optimal normalized feedback length tends to zero at the rate 1/log^2 Nt,
as opposed to 1/log Nt for the MISO channel. Note, however, that the training overhead is
the same since the same training symbols are used to estimate the channel gains to all receive
antennas simultaneously. Hence the ratio of optimized feedback to training overhead for the
MIMO channel tends to zero as 1/log Nt.
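The difference between the MISO and MIMO overhead scalings can be tabulated from the leading-order asymptotics. The sketch below is an illustration under stated assumptions: the MIMO feedback constant is taken from the feedback-to-training ratio quoted with the numerical results, $\mu B/T = L \log 2 / (2 \mu N_r \log N_t)$, and the function name and example parameters are hypothetical.

```python
import math

def asymptotic_overhead_fractions(n_t, l_bar, n_r_bar=1.0, mu=1.0):
    """Leading-order overhead fractions of the block length (natural logs)."""
    log_nt = math.log(n_t)
    training = 1.0 / log_nt               # T/L, same for MISO and MIMO
    miso_feedback = 1.0 / log_nt          # mu*B/L for the MISO channel
    # mu*B/L for the MIMO channel: training fraction times the quoted ratio.
    mimo_feedback = l_bar * math.log(2.0) / (2.0 * mu * n_r_bar * log_nt ** 2)
    return {"training": training,
            "miso_feedback": miso_feedback,
            "mimo_feedback": mimo_feedback}

fractions = asymptotic_overhead_fractions(n_t=100, l_bar=1.0, n_r_bar=1.0)
```

For large Nt the MIMO feedback fraction is smaller than the training fraction by a factor proportional to 1/log Nt, whereas for the MISO channel the two fractions coincide.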
V. Numerical Results

Fig. 2 shows achievable rates for the MISO channel versus normalized coherence time $\bar{L} = L/N_t$
with different assumptions about channel knowledge at the transmitter and receiver. Three
curves are shown: (1) the optimized lower bound on capacity $\mathcal{C}_l^o$, (2) the capacity assuming
the receiver knows the channel, but with a quantized beamformer, and (3) the capacity with
perfect channel knowledge at the transmitter and receiver (optimal beamforming). Parameters
are $N_t = 10$, $\rho = 5$ dB, and $\mu = 1$ (BPSK feedback). As expected, the gaps between the curves
diminish to zero with increasing coherence time, albeit slowly. This reflects the fact that the
training and feedback overhead tends to zero as 1/log L.

Fig. 3 illustrates the sensitivity of the capacity for the MISO channel to different choices for
training and feedback overhead. The lower bound $\mathcal{C}_l$ is plotted versus the fractional overhead
$(T + \mu B)/L$ with different relative allocations $T/(\mu B)$. Parameters are L = 100, Nt = 6, $\mu = 1$,
and $\rho = 5$ dB. The solid line corresponds to optimized overhead $T^o$ and $B^o$. The capacity is
zero when T + B = 0, since the estimate is uncorrelated with the channel, and when T + B = L, 
since D = 0. With equal amounts of training and feedback the rate is essentially equal to that 
with optimized parameters. The peak is achieved when {T + B)/L = 0.1. The performance is 
relatively robust to this choice, i.e., small deviations from this value result in a relatively small 
performance loss, although the performance loss increases substantially as the deviations become 
larger. Likewise, the figure also shows that there is a significant performance degradation when 
B deviates significantly from T. 

The optimized training, feedback, and data portions of the packet (normalized by the packet
length L) versus Nt for the MIMO channel are shown in Fig. 4. These values were obtained
by numerically optimizing the capacity lower bound, and are therefore denoted as $B_l^o$, $T_l^o$, and
$D_l^o$ in the figure. System parameters are $N_r = 2$, $L = 50$, $\mu = 1$, and $\rho = 5$ dB. As predicted
by Theorem 3, both the optimal T and B decrease to zero, with B decreasing somewhat faster
than T. The associated capacity lower bound is shown in Fig. 5. Also shown is the capacity
lower bound with the heuristic choice of parameters $\bar{B} = 1$ (one feedback bit per coefficient)
and $\bar{T} = 1.5$ (1.5 training symbols per coefficient). For Nt = 3, the bound with optimized
parameters is approximately 10% greater than that with the heuristic choice. Those results are
compared with the capacity with perfect channel knowledge at both the transmitter and receiver,
and the capacity with perfect channel knowledge at the receiver only with $B_l^o$ feedback bits.
This comparison indicates how much of the loss in achievable rate for the model considered is 
due to channel estimation at the receiver (including associated overhead), and how much is due 
to quantization of the precoding matrix. 

The results show that for Nt = 3, the capacity with perfect channel knowledge at both the 
transmitter and receiver is about 40% larger than the rate with optimized feedback and training 
lengths. Knowing the channel at the receiver achieves most of this gain, largely due to the 
elimination of associated training overhead. Of course, this gap tends to zero as the block size 
$L \to \infty$. Also shown in the figure for comparison is the capacity lower bound for a MISO channel
with optimized training and feedback lengths. This is substantially lower than that shown for
the MIMO channel. From Theorems 2 and 3 the gap between the optimized lower bounds for
the MISO and MIMO channels increases as log log Nt.

Similar to Fig. 3, Fig. 6 shows the capacity lower bound versus total overhead $(T + \mu B)/L$
for a MIMO channel. The solid line corresponds to optimized parameters with L = 10, Nt = 9,
Nr = 2, $\mu = 1$, and $\rho = 5$ dB. The curves are obtained by numerical optimization. For
the case considered, these results show that the rate achieved with equal portions of training
and feedback is close to the maximum (corresponding to optimized training and feedback).
Allocating the overhead according to the asymptotic results in Theorem 3, i.e., taking
$\mu B/T = L \log 2 / (2 \mu N_r \log N_t)$, performs marginally better than allocating equal training and feedback.
The total optimized overhead in this case is $(T + B)/L \approx 0.2$. The performance degrades when
B deviates significantly from T (as shown by the curve corresponding to B = 2T). (The three
curves shown are not extended to (T + B)/L = 1 since the simulation complexity associated
with RVQ increases exponentially with B.) Compared with the results for the MISO channel in
Fig. 3, the capacity for the MIMO channel is somewhat more robust with respect to variations
in overhead.

VI. Conclusions 

We have presented bounds on the capacity of both MISO and MIMO block Rayleigh fading 
channels with beamforming, assuming limited training and feedback. For a large number of 
transmit antennas, we have characterized the optimal amount of training and feedback as a 
fraction of the packet duration, assuming linear MMSE estimation of the channel, and an RVQ 
codebook for quantizing the beamforming vector. Our results show that the optimized training
length for both MISO and MIMO channels increases as Nt/log Nt, which can be interpreted as
the optimal number of transmit antennas to activate. The ratio of optimized feedback to training
overhead tends to one for the MISO channel, but tends to zero as 1/log Nt for the MIMO channel,
since additional receiver antennas improve robustness with respect to quantization error. The loss
in capacity due to overhead increases as log log Nt for the MIMO channel, and as 2 log log Nt 
for the MISO channel. 

Although the pilot scheme considered is practical, it is most likely suboptimal. That is, in 
the absence of feedback such a pilot-based scheme is strictly suboptimal, although it is nearly 
optimal at high SNRs [24]. Computing the capacity of the block fading channel considered 
with feedback and no channel knowledge at the receiver and transmitter is an open problem. 
Consequently, although the optimal (capacity-maximizing) number of transmit antennas should 
still be limited by the coherence time, the growth rate may differ from the L/ log L growth rate 
shown here for the pilot scheme. 

The model and analysis presented here can be extended in a few different directions. A natural 
generalization of the MIMO beamforming model is to allow a general transmit precoding matrix 
with rank greater than one. The additional overhead should impose a limit on both the number of 
beams and antennas that can effectively be used. Also, the powers allocated to the training and 
data portions of the coherence block can be optimized in addition to the fraction of overhead 
symbols. Finally, feedback and training overhead becomes especially important in multi-user 
MIMO scenarios, such as the cellular downlink. The optimal overhead scaling with coherence 
time in those scenarios remains to be studied. 
Appendix 

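A. Proof of Lemma 1 

We need to evaluate (21). Letting $n = 2^{B}$, we first bound 

$$
\begin{aligned}
n B\!\left(n, 1 + \tfrac{1}{N_t-1}\right)
&= \frac{n\,\Gamma(n)\,\Gamma\!\left(1 + \frac{1}{N_t-1}\right)}{\Gamma\!\left(n + 1 + \frac{1}{N_t-1}\right)} && (48)\\
&= \Gamma\!\left(1 + \tfrac{1}{N_t-1}\right) \frac{\Gamma(n+2)}{(n+1)\,\Gamma\!\left(n + 1 + \frac{1}{N_t-1}\right)} && (49)\\
&> \Gamma\!\left(1 + \tfrac{1}{N_t-1}\right) (n+1)^{-\frac{1}{N_t-1}} && (50)\\
&= \Gamma\!\left(1 + \tfrac{1}{N_t-1}\right) n^{-\frac{1}{N_t-1}} \left(1 + \tfrac{1}{n}\right)^{-\frac{1}{N_t-1}} && (51)
\end{aligned}
$$

where we have used $B(p,q) = \Gamma(p)\Gamma(q)/\Gamma(p+q)$, the identity $\Gamma(k+1) = k\Gamma(k)$ for $k \in \mathbb{N}$, 
and the inequality $\Gamma(k+1)/\Gamma(k+x) > k^{1-x}$ for $0 < x < 1$ [34]. Since $\Gamma(x)$ is convex for 
$x \in [1,2]$, for $N_t \ge 2$, 

$$
\Gamma\!\left(1 + \frac{1}{N_t-1}\right) \ge \Gamma(1) + \frac{\Gamma'(1)}{N_t-1} = 1 - \frac{\gamma}{N_t-1} \qquad (52)
$$

where $\gamma = 0.5772\ldots$ is the Euler constant. Expanding the factor $(1 + 1/n)^{-1/(N_t-1)}$ on the right-hand side 
of (51) in a Taylor series gives 

$$
\begin{aligned}
\left(1 + \frac{1}{n}\right)^{-\frac{1}{N_t-1}}
&= 1 - \frac{1}{N_t-1}\frac{1}{n} + \frac{N_t}{2!\,(N_t-1)^2}\frac{1}{n^2} - \frac{N_t(2N_t-1)}{3!\,(N_t-1)^3}\frac{1}{n^3} + \cdots && (53)\\
&\ge 1 - \frac{1}{n(N_t-1)} && (54)
\end{aligned}
$$

since the magnitude of each term in (53) is decreasing. With the normalized feedback $\bar{B} = B/N_t$, the remaining factor is 
$n^{-1/(N_t-1)} = 2^{-\bar{B}}\,(2^{-\bar{B}})^{1/(N_t-1)}$. We also expand 

$$
\begin{aligned}
\left(2^{-\bar{B}}\right)^{\frac{1}{N_t-1}}
&= 1 - \frac{1}{N_t-1}\left(1-2^{-\bar{B}}\right) - \frac{N_t-2}{2!\,(N_t-1)^2}\left(1-2^{-\bar{B}}\right)^2 - \frac{(N_t-2)(2N_t-3)}{3!\,(N_t-1)^3}\left(1-2^{-\bar{B}}\right)^3 - \cdots && (55)\\
&\ge 1 - \frac{1}{N_t-1}\left[\left(1-2^{-\bar{B}}\right) + \left(1-2^{-\bar{B}}\right)^2 + \left(1-2^{-\bar{B}}\right)^3 + \cdots\right] && (56)\\
&= 1 - \frac{1}{N_t-1}\left(2^{\bar{B}}-1\right). && (57)
\end{aligned}
$$

Substituting (52), (54), and (57) into (51) yields 

$$
\begin{aligned}
n B\!\left(n, 1 + \tfrac{1}{N_t-1}\right)
&> 2^{-\bar{B}}\left(1 - \frac{\gamma}{N_t-1}\right)\left(1 - \frac{1}{n(N_t-1)}\right)\left(1 - \frac{2^{\bar{B}}-1}{N_t-1}\right) && (58)\\
&\ge 2^{-\bar{B}}\left[1 - \frac{1}{N_t-1}\left(2^{\bar{B}} - 1 + \gamma + 2^{-\bar{B}N_t}\right)\right]. && (59)
\end{aligned}
$$

The inequality (59) holds for $N_t \ge 2$ and $\bar{B} \ge 0$. Therefore 

$$
\begin{aligned}
E[\nu] &= 1 - 2^{B} B\!\left(2^{B}, 1 + \tfrac{1}{N_t-1}\right) && (60)\\
&< 1 - 2^{-\bar{B}} + \frac{1}{N_t-1}\left(1 + (\gamma-1)2^{-\bar{B}} + 2^{-\bar{B}(N_t+1)}\right). && (61)
\end{aligned}
$$

To show (23), we derive the following upper bound 

$$
\begin{aligned}
n B\!\left(n, 1 + \tfrac{1}{N_t-1}\right)
&= \Gamma\!\left(1 + \tfrac{1}{N_t-1}\right)\frac{\Gamma(n+1)}{\Gamma\!\left(n + 1 + \frac{1}{N_t-1}\right)} && (62)\\
&\le \Gamma\!\left(1 + \tfrac{1}{N_t-1}\right)\left(n + \frac{1}{2} + \frac{1}{2(N_t-1)}\right)^{-\frac{1}{N_t-1}} && (63)\\
&= \Gamma\!\left(1 + \tfrac{1}{N_t-1}\right)\left(1 + \frac{N_t}{2n(N_t-1)}\right)^{-\frac{1}{N_t-1}} 2^{-\bar{B}\frac{N_t}{N_t-1}}. && (64)
\end{aligned}
$$

The inequality (63) is shown in [35]. Since every factor in (64) is less than or equal to one, we 
conclude that 

$$
n B\!\left(n, 1 + \tfrac{1}{N_t-1}\right) \le 2^{-\bar{B}\frac{N_t}{N_t-1}} \qquad (65)
$$

and combining with (60) gives the lower bound (23). 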
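The two bounds obtained in this appendix can be checked numerically. The following Python sketch is illustrative only and is not part of the original analysis: it evaluates the exact expression $E[\nu] = 1 - nB(n, 1 + 1/(N_t-1))$ with log-gamma functions and compares it with the lower bound $1 - 2^{-B/(N_t-1)}$ and with the upper bound $1 - 2^{-\bar{B}} + (1 + (\gamma-1)2^{-\bar{B}} + 2^{-\bar{B}(N_t+1)})/(N_t-1)$. The function names are hypothetical, and the normalization $\bar{B} = B/N_t$ is an assumption of the sketch.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler constant


def log_beta(p, q):
    """Natural log of the beta function B(p, q) = Gamma(p)Gamma(q)/Gamma(p+q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def rvq_mean_gain(bits, n_t):
    """Exact E[nu] = 1 - n * B(n, 1 + 1/(n_t - 1)) with n = 2**bits."""
    n = 2.0 ** bits
    a = 1.0 / (n_t - 1)
    return 1.0 - math.exp(math.log(n) + log_beta(n, 1.0 + a))


def lower_bound(bits, n_t):
    """Lower bound 1 - 2**(-bits/(n_t - 1))."""
    return 1.0 - 2.0 ** (-bits / (n_t - 1))


def upper_bound(bits, n_t):
    """Upper bound written in terms of b_bar = bits / n_t (assumed normalization)."""
    b_bar = bits / n_t
    corr = 1.0 + (EULER_GAMMA - 1.0) * 2.0 ** (-b_bar) + 2.0 ** (-b_bar * (n_t + 1))
    return 1.0 - 2.0 ** (-b_bar) + corr / (n_t - 1)


if __name__ == "__main__":
    for n_t in (4, 16, 64):
        for bits in (2, 8, 16):
            print(n_t, bits,
                  round(lower_bound(bits, n_t), 4),
                  round(rvq_mean_gain(bits, n_t), 4),
                  round(upper_bound(bits, n_t), 4))
```

In this sketch the gap between the exact value and the upper bound shrinks as `n_t` grows with `bits / n_t` held fixed, which is the regime used in the asymptotic analysis.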



B. Proof of Lemma 2 

Since log ^1 + ^-^j is concave for X E [0, oo) and 



t^oo t 

we can apply the following inequality in [33] 

1 



lim ilog \l + ^t\ = 0, 



E 



log 1 



2E[X] ' ^ 



1 + 



(66) 



(67) 
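Applying the Cauchy-Schwarz inequality, we have 

$$
\begin{aligned}
E\,|X - E[X]| &= \int_0^\infty |x - E[X]|\, f_X(x)\,dx && (68)\\
&\le \sqrt{\int_0^\infty (x - E[X])^2 f_X(x)\,dx}\;\sqrt{\int_0^\infty f_X(x)\,dx} && (69)\\
&= \sqrt{\mathrm{var}[X]} && (70)
\end{aligned}
$$

where $f_X(\cdot)$ is the pdf for $X$. Now set $X = A\nu$, where $A = \|h\|^2$ and $\nu = |h^\dagger v(h)|^2/\|h\|^2$. 
Since $A$ and $\nu$ are independent, we obtain 

$$
\begin{aligned}
\frac{E\,|X - E[X]|}{2E[X]} &\le \frac{\sqrt{\mathrm{var}[X]}}{2E[X]} && (71)\\
&= \frac{1}{2}\sqrt{\frac{E[A^2]\,E[\nu^2]}{E^2[A]\,E^2[\nu]} - 1}. && (72)
\end{aligned}
$$

Each element in $h$ is i.i.d. with a complex Gaussian distribution. Hence $A$ is Gamma distributed 
so that 

$$
\frac{E[A^2]}{E^2[A]} = 1 + \frac{1}{N_t}. \qquad (73)
$$

To evaluate $E[\nu^2]/E^2[\nu]$ in (72) we first compute 

$$
\begin{aligned}
E[(1-\nu)^2] &= \int_0^1 (1-v)^2 f_\nu(v)\,dv && (74)\\
&= \int_0^1 (1-v)^2\, n(N_t-1)\left(1 - (1-v)^{N_t-1}\right)^{n-1}(1-v)^{N_t-2}\,dv && (75)
\end{aligned}
$$

where $f_\nu(\cdot)$ is the pdf for $\nu$, and is given in [16]. Applying the change of variables $q = (1-v)^{N_t-1}$ 
gives 

$$
\begin{aligned}
E[(1-\nu)^2] &= n\int_0^1 (1-q)^{n-1} q^{\frac{2}{N_t-1}}\,dq && (76)\\
&= n B\!\left(n, 1 + \tfrac{2}{N_t-1}\right). && (77)
\end{aligned}
$$

Therefore 

$$
\begin{aligned}
\mathrm{var}[\nu] &= E[\nu^2] - E^2[\nu] && (78)\\
&= E[(1-\nu)^2] - (1 - E[\nu])^2 && (79)\\
&= n B\!\left(n, 1 + \tfrac{2}{N_t-1}\right) - n^2 B^2\!\left(n, 1 + \tfrac{1}{N_t-1}\right). && (80)
\end{aligned}
$$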
(80) 
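Applying the inequality in [35], we have 

$$
\begin{aligned}
n B\!\left(n, 1 + \tfrac{2}{N_t-1}\right)
&= \Gamma\!\left(1 + \tfrac{2}{N_t-1}\right)\frac{\Gamma(n+1)}{\Gamma\!\left(n + 1 + \frac{2}{N_t-1}\right)} && (81)\\
&\le \Gamma\!\left(1 + \tfrac{2}{N_t-1}\right)\left(n + \frac{1}{N_t-1} + \frac{1}{2}\right)^{-\frac{2}{N_t-1}}. && (82)
\end{aligned}
$$

Substituting (82) and (50) into (80) gives 
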



var z/ 



< r 1 



n + - — - + - - rM 1 + 



r 1 + 



Nt-l 2 
2 



iV, - 1 



Nt-l 

2 



(n + l)"^ (83) 



1 1 \ iVt-l 

^ n{Nt - 1) ^ 2^ 



1 



1 

Nt-l I \ ^ n 



r 1 + 



N-l 



rMi + 



1 



1 \ Nt-l 

Nt-l) \^^n 



Since the second factor in (l64l) is less than or equal to one, we have 



(84) 
(85) 

(86) 



Finally, combining and & gives E \X - E[X]\ / {2E[X]) < d{Nt) in 

which completes the proof. 



C. Proof of Theorem 2 

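We first maximize the upper bound given by 

$$
\begin{aligned}
C &\le C_u && (87)\\
&= \frac{\bar{D}}{\bar{L}}\log\!\left(\frac{\rho}{1+\rho^{-1}}\,\bar{T}\left(1 - 2^{-\bar{B}}\right)N_t\right) + \frac{\bar{D}}{\bar{L}}\log\left(1 + r(N_t)\right) && (88)
\end{aligned}
$$

where 

$$
r(N_t) = \frac{(1+\rho^{-1})^2 - \bar{T}}{\bar{T}\left(1 - 2^{-\bar{B}}\right)N_t} + \frac{1 + (\gamma-1)2^{-\bar{B}} + 2^{-\bar{B}(N_t+1)}}{(N_t-1)\left(1 - 2^{-\bar{B}}\right)}. \qquad (89)
$$

Note that $r(N_t) \to 0$ as $N_t \to \infty$ for $\bar{B}, \bar{T} > 0$. Also, the expression for $\sigma_w^2$ in (12) with 
$\bar{T} < 1$ has been used in (88), since we will show that $\bar{T} \to 0$ as $N_t \to \infty$. We are interested in 
characterizing $\{\bar{T}_u^\circ, \bar{B}_u^\circ\}$ as $N_t \to \infty$. 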
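The Lagrangian is given by 

$$
\mathcal{L} = C_u + \lambda\left(\bar{L} - \bar{T} - \mu\bar{B} - \bar{D}\right) \qquad (90)
$$

where $\lambda$ is the Lagrange multiplier. Setting the partial derivatives of $\mathcal{L}$ with respect to $\bar{D}$, $\bar{T}$, 
$\bar{B}$, and $\lambda$ to zero gives the necessary conditions 

$$
\begin{aligned}
\log\!\left(\frac{\rho}{1+\rho^{-1}}\right) + \log(\bar{T}) + \log\left(1 - 2^{-\bar{B}}\right) + \log N_t + \log\left(1 + r(N_t)\right) - \bar{L}\lambda &= 0 && (91)\\
\frac{\bar{D}}{\bar{T}} + \left(\frac{\bar{D}}{1 + r(N_t)}\right)\frac{\partial r(N_t)}{\partial \bar{T}} - \bar{L}\lambda &= 0 && (92)\\
\frac{\bar{D}\log 2}{2^{\bar{B}} - 1} + \left(\frac{\bar{D}}{1 + r(N_t)}\right)\frac{\partial r(N_t)}{\partial \bar{B}} - \bar{L}\mu\lambda &= 0 && (93)\\
\bar{L} - \bar{T} - \mu\bar{B} - \bar{D} &= 0. && (94)
\end{aligned}
$$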

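Substituting (92) and (94) into (91) gives 

$$
\begin{aligned}
&\bar{T}\log N_t + \bar{T}\log\!\left(\frac{\rho}{1+\rho^{-1}}\right) + \bar{T}\log\left(1 - 2^{-\bar{B}}\right) + \bar{T}\log\bar{T} + \bar{T}\log\left(1 + r(N_t)\right) \\
&\qquad = \left(\bar{L} - \bar{T} - \mu\bar{B}\right)\left(1 + \left(\frac{\bar{T}}{1 + r(N_t)}\right)\frac{\partial r(N_t)}{\partial \bar{T}}\right). && (95)
\end{aligned}
$$

We will subsequently show that 

$$
\left(\frac{\bar{T}}{1 + r(N_t)}\right)\frac{\partial r(N_t)}{\partial \bar{T}} \to 0, \qquad (96)
$$

so that (95) implies $\bar{T} \to 0$. Hence as $N_t \to \infty$, (95) becomes 

$$
\bar{T}\log N_t \to \bar{L} - \mu\bar{B}. \qquad (97)
$$
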

Combining (l92l) and (l93l) gives 
where 

C(NA = - ('-^ - ^] (99) 

(log2)(l + r(iV,)) V/i dB df )' ^ ' 

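As $N_t \to \infty$, we claim that $\zeta(N_t) \to 0$, which will be proved by showing (96) and 

$$
\left(\frac{\bar{T}}{1 + r(N_t)}\right)\frac{\partial r(N_t)}{\partial \bar{B}} \to 0. \qquad (100)
$$

Hence for large $N_t$ (98) implies that 

$$
\bar{B} = \frac{1}{\mu}\bar{T} + O(\bar{T}^2). \qquad (101)
$$

Combining (97) and (101), it follows that 

$$
\begin{aligned}
\bar{T}_u^\circ \log N_t &\to \bar{L} && (102)\\
\mu\bar{B}_u^\circ \log N_t &\to \bar{L}. && (103)
\end{aligned}
$$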



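We now show (96) and (100). Taking the derivative of $r(N_t)$ in (89) with respect to $\bar{T}$ gives 

$$
\left.\frac{\partial r(N_t)}{\partial \bar{T}}\right|_{\bar{T} = \bar{T}_u^\circ,\ \bar{B} = \bar{B}_u^\circ} = -\frac{(1+\rho^{-1})^2}{\left(1 - 2^{-\bar{B}_u^\circ}\right)(\bar{T}_u^\circ)^2 N_t}. \qquad (104)
$$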



Combining (104) with (102) and (103), and applying L'Hopital's rule, we have 







(105) 



(l-2-^S)(T°)2iV, V^'log2, 
where $a(N_t) \asymp b(N_t)$ means that $a(N_t)/b(N_t) \to 1$ as $N_t \to \infty$. Combining (104), (105), and 
the fact that $r(N_t) \to 0$ establishes (96). 
We also compute 



95 



'T—'T'o 



[(1 + p-^)^ - f,"](log2)2-^g (7 + 2-^i^^0(log2)2-^S 
f °(1 - 2-^S)2Ar, (i_2-5s)2(Ar, -1) 

/(log2)2-^2iV'' 



(106) 



1 



Similar to (105), as $N_t \to \infty$, we evaluate 

1 



f -(1 - 2-ss)2Ar, 



1 



^l_2-BS)2(Ar, -1 



L3 log^ 2 

_ 

Llog2 



'log^ 



2-B°Nt ^2 ^(logArt, 







-(logAr,)2 - 



0. 



1 - 2-^S L log 2 

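Combining (107)-(110) with (106) implies that $\partial r(N_t)/\partial \bar{B} \to 0$, which establishes (100). 
Substituting the optimal parameters in the capacity upper bound (88) gives 

$$
C_u^\circ - \frac{\bar{D}}{\bar{L}}\log(\rho N_t) - \frac{\bar{D}}{\bar{L}}\log\bar{T}_u^\circ - \frac{\bar{D}}{\bar{L}}\log\left(1 - 2^{-\bar{B}_u^\circ}\right)
= -\frac{\bar{D}}{\bar{L}}\log(1+\rho^{-1}) + \frac{\bar{D}}{\bar{L}}\log\left(1 + r(N_t)\right)
$$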



(107) 

(108) 
(109) 
(110) 



(111) 



where $C_u^\circ$ denotes the optimal $C_u$. Taking $N_t \to \infty$ gives 

$$
C_u^\circ - \log(\rho N_t) + 2\log\log N_t \to \log(\bar{L}^2\log 2) - 2 - \log[\mu(1+\rho^{-1})]. \qquad (112)
$$

We follow similar steps to optimize the lower bound. Instead of (96) and (100), we must show 
that 

^^0, ^^0. (113) 

OB ' dT ^ ^ 

We then have 

$$
\begin{aligned}
\bar{T}_l^\circ \log N_t &\to \bar{L} && (114)\\
\bar{B}_l^\circ \log N_t &\to \frac{1}{\mu}\bar{L} && (115)
\end{aligned}
$$

and the optimized lower bound satisfies 

$$
C_l^\circ - \log(\rho N_t) + 2\log\log N_t \to \log(\bar{L}^2\log 2) - 2 - \log[\mu(1+\rho^{-1})] - \log(1+\rho). \qquad (116)
$$

Since the optimized bounds grow with $N_t$ at the same rate, the capacity must also grow at 
that rate. Furthermore, it can be shown that the bounds on achievable rate increase more slowly 
with $N_t$ if the training and feedback do not satisfy (33) and (34). Hence we conclude that the 
parameters that maximize the capacity exhibit this asymptotic behavior. 
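To give a feel for these limits, the following Python sketch treats the first-order relations $\bar{T}\log N_t \to \bar{L}$ and $\mu\bar{B}\log N_t \to \bar{L}$ as if they held with equality and reports the resulting overhead fractions. This is only an illustration under that assumption: the function name is hypothetical, `mu` is taken to be the number of channel symbols per feedback bit (as suggested by the constraint $\bar{L} = \bar{T} + \mu\bar{B} + \bar{D}$), and the logarithm is natural. The approximation is meaningful only for large $N_t$; for small $N_t$ the two overhead terms would exceed the whole block.

```python
import math


def miso_asymptotic_overhead(n_t, mu=1.0):
    """First-order MISO overhead fractions, assuming the limits hold exactly.

    Returns (T/L, B/L, D/L): training symbols, feedback bits, and data symbols,
    each as a fraction of the coherence block length L.
    """
    if n_t <= 1:
        raise ValueError("n_t must exceed 1")
    frac = 1.0 / math.log(n_t)      # common value of T/L and mu*B/L
    training = frac                 # T / L
    feedback_bits = frac / mu       # B / L
    data = 1.0 - training - mu * feedback_bits  # D / L
    return training, feedback_bits, data


if __name__ == "__main__":
    for n_t in (64, 1024, 2 ** 20):
        print(n_t, miso_asymptotic_overhead(n_t))
```

The slow $1/\log N_t$ decay is visible in the output: even with about a million antennas the sketch still assigns more than a tenth of the block to overhead.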

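D. Proof of Theorem 3 

Similar to the proof of Theorem 2 in Appendix C, we first optimize the upper bound given 
by 

$$
C_u = \frac{\bar{D}}{\bar{L}}\log\!\left(\frac{\rho}{1+\rho^{-1}}\,\bar{T}\,\gamma_{\mathrm{rvq}}\,N_t\right) + \frac{\bar{D}}{\bar{L}}\log\left(1 + s(N_t)\right) \qquad (117)
$$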

where 

, , (1 +p-i)2 + (l + p-i)fi:(Ar<) -f 

sm = - — ' ^ — AT — ' (118) 

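and we have substituted $\sigma_w^2 = 1 - \bar{T}/(1+\rho^{-1})$, corresponding to $\bar{T} < 1$, since we will show 
that the optimal normalized training length $\bar{T}^\circ \to 0$ as $N_t \to \infty$. 
The Lagrangian for this optimization problem is given by 

$$
\mathcal{L} = C_u + \lambda\left(\bar{L} - \bar{T} - \mu\bar{B} - \bar{D}\right) \qquad (119)
$$

where $\lambda$ is the Lagrange multiplier. The first-order necessary conditions are 

$$
\begin{aligned}
\log\!\left(\frac{\rho}{1+\rho^{-1}}\right) + \log(\bar{T}) + \log(\gamma_{\mathrm{rvq}}) + \log N_t + \log\left(1 + s(N_t)\right) - \bar{L}\lambda &= 0 && (120)\\
\frac{\bar{D}}{\bar{T}} + \left(\frac{\bar{D}}{1 + s(N_t)}\right)\frac{\partial s(N_t)}{\partial \bar{T}} &= \bar{L}\lambda && (121)\\
\frac{\bar{D}}{\gamma_{\mathrm{rvq}}}\frac{\partial \gamma_{\mathrm{rvq}}}{\partial \bar{B}} + \left(\frac{\bar{D}}{1 + s(N_t)}\right)\frac{\partial s(N_t)}{\partial \bar{B}} &= \bar{L}\mu\lambda && (122)\\
\bar{L} - \bar{T} - \mu\bar{B} - \bar{D} &= 0. && (123)
\end{aligned}
$$

Substituting (121) and (123) into (120) gives 

$$
\begin{aligned}
&\bar{T}\log N_t + \bar{T}\log\!\left(\frac{\rho}{1+\rho^{-1}}\right) + \bar{T}\log(\gamma_{\mathrm{rvq}}) + \bar{T}\log(\bar{T}) + \bar{T}\log\left(1 + s(N_t)\right) \\
&\qquad = \left(\bar{L} - \bar{T} - \mu\bar{B}\right)\left(1 + \left(\frac{\bar{T}}{1 + s(N_t)}\right)\frac{\partial s(N_t)}{\partial \bar{T}}\right). && (124)
\end{aligned}
$$

Using an analogous argument to that used to show (96) in Appendix C, we can show that 
$\left(\frac{\bar{T}}{1 + s(N_t)}\right)\frac{\partial s(N_t)}{\partial \bar{T}} \to 0$. Taking $N_t \to \infty$ therefore gives 

$$
\bar{T}_u^\circ \log N_t - \bar{L} \to 0, \qquad (125)
$$

assuming that $\bar{B}_u^\circ \to 0$, which will be proved next. 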

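Substituting (121) into (122) to eliminate $\lambda$ and rearranging gives 

$$
\frac{1}{\gamma_{\mathrm{rvq}}}\frac{\partial \gamma_{\mathrm{rvq}}}{\partial \bar{B}} = \frac{\mu}{\bar{T}}\left[1 + \frac{\bar{T}}{1 + s(N_t)}\left(\frac{\partial s(N_t)}{\partial \bar{T}} - \frac{1}{\mu}\frac{\partial s(N_t)}{\partial \bar{B}}\right)\right]. \qquad (126)
$$

Similar to the proof in Appendix C, we can show that $\left(\frac{\bar{T}}{1 + s(N_t)}\right)\frac{\partial s(N_t)}{\partial \bar{B}} \to 0$, so that 

$$
\left(\frac{1}{\gamma_{\mathrm{rvq}}}\frac{\partial \gamma_{\mathrm{rvq}}}{\partial \bar{B}}\right)^{-1} \to 0. \qquad (127)
$$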



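For $0 \le \bar{B} \le \bar{B}^*$ it is shown in [14, Theorem 3] that $\gamma_{\mathrm{rvq}}$ satisfies (after some rearrangement) 

$$
\frac{\gamma_{\mathrm{rvq}}}{\bar{N}_r}\, e^{-\gamma_{\mathrm{rvq}}/\bar{N}_r} = \frac{1}{e}\, 2^{-\bar{B}/\bar{N}_r} \qquad (128)
$$

where $\bar{B}^*$ is given by 

$$
\bar{B}^* = \frac{1}{\log 2}\left(\bar{N}_r\log\!\left(\sqrt{\bar{N}_r}\right) - \bar{N}_r\log\!\left(1 + \sqrt{\bar{N}_r}\right) + \sqrt{\bar{N}_r}\right). \qquad (129)
$$

We can therefore write $-\gamma_{\mathrm{rvq}}/\bar{N}_r = W(-2^{-\bar{B}/\bar{N}_r} e^{-1})$, where $W(x)$ is the Lambert-W function. 
It is straightforward to show that 

$$
\left(\frac{1}{\gamma_{\mathrm{rvq}}}\frac{\partial \gamma_{\mathrm{rvq}}}{\partial \bar{B}}\right)^{-1} = \left(\frac{\partial[\log\gamma_{\mathrm{rvq}}]}{\partial \bar{B}}\right)^{-1} = \frac{\gamma_{\mathrm{rvq}} - \bar{N}_r}{\log 2}. \qquad (130)
$$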
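Hence from (127), $\gamma_{\mathrm{rvq}}/\bar{N}_r \to 1$ as $N_t \to \infty$, which implies that $\bar{B} \to 0$. 

To determine the first-order rate at which $\bar{B} \to 0$, we combine (126) and (130) to write 

$$
\frac{\gamma_{\mathrm{rvq}}}{\bar{N}_r} - 1 = \frac{\log 2}{\mu\bar{N}_r}\,\bar{T} + O(\bar{T}^2). \qquad (131)
$$

The behavior of $\gamma_{\mathrm{rvq}}$ for small $\bar{B}$ (equivalently, $\gamma_{\mathrm{rvq}}/\bar{N}_r$ close to one) can be determined by 
expanding $W(x)$ around $x = -e^{-1}$. Such an expansion is given in [36], which we rewrite as 

$$
\gamma_{\mathrm{rvq}} = \bar{N}_r\left(1 + \sqrt{\zeta_B} + \frac{1}{3}\zeta_B + \frac{11}{72}\zeta_B\sqrt{\zeta_B} + O(\zeta_B^2)\right) \qquad (132)
$$

where $\zeta_B = 2(1 - 2^{-\bar{B}/\bar{N}_r}) = (2\log 2)(\bar{B}/\bar{N}_r) + O(\bar{B}^2)$ for small $\bar{B}$. Hence we have 

$$
\frac{\gamma_{\mathrm{rvq}}}{\bar{N}_r} - 1 = \sqrt{\zeta_B} + O(\zeta_B) = \sqrt{\frac{(2\log 2)\bar{B}}{\bar{N}_r}} + O(\bar{B}). \qquad (133)
$$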
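The quality of the truncated expansion can be examined numerically. The Python sketch below is illustrative only: it assumes the fixed-point relation $(\gamma_{\mathrm{rvq}}/\bar{N}_r)e^{-\gamma_{\mathrm{rvq}}/\bar{N}_r} = 2^{-\bar{B}/\bar{N}_r}e^{-1}$ with $\gamma_{\mathrm{rvq}} \ge \bar{N}_r$, solves it by bisection, and compares the root with the series $1 + \sqrt{\zeta_B} + \zeta_B/3 + 11\zeta_B\sqrt{\zeta_B}/72$. The function names are hypothetical, and the sketch does not check the validity condition $\bar{B} \le \bar{B}^*$.

```python
import math


def gamma_rvq_exact(b_bar, n_r_bar, tol=1e-12):
    """Solve y * exp(1 - y) = 2**(-b_bar / n_r_bar) for y >= 1; return n_r_bar * y."""
    target = 2.0 ** (-b_bar / n_r_bar)
    lo, hi = 1.0, 2.0
    # y * exp(1 - y) decreases from 1 to 0 on [1, inf); expand hi until the root is bracketed.
    while hi * math.exp(1.0 - hi) > target:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * math.exp(1.0 - mid) > target:
            lo = mid  # function value still too large, so the root lies to the right
        else:
            hi = mid
    return n_r_bar * 0.5 * (lo + hi)


def gamma_rvq_series(b_bar, n_r_bar):
    """Truncated expansion around the branch point of the Lambert-W function."""
    zeta = 2.0 * (1.0 - 2.0 ** (-b_bar / n_r_bar))
    root = math.sqrt(zeta)
    return n_r_bar * (1.0 + root + zeta / 3.0 + 11.0 * zeta * root / 72.0)


if __name__ == "__main__":
    for b_bar in (0.01, 0.05, 0.2):
        print(b_bar, gamma_rvq_exact(b_bar, 1.0), gamma_rvq_series(b_bar, 1.0))
```

For small normalized feedback the two values agree closely, and the gap widens as `b_bar` grows, consistent with the expansion being a small-feedback approximation.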



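Combining this with (131) gives 

$$
\bar{B} = \frac{\log 2}{2\mu^2\bar{N}_r}\,\bar{T}^2 + O(\bar{T}^3) \qquad (134)
$$

and substituting for $\bar{T}$ from (125), we conclude that the feedback overhead that maximizes the 
upper bound on achievable rate satisfies 

$$
\bar{B}_u^\circ \log^2 N_t \to \frac{\bar{L}^2\log 2}{2\mu^2\bar{N}_r}. \qquad (135)
$$

Substituting for the optimized $\bar{T}_u^\circ$ and $\bar{B}_u^\circ$ in $C_u$ gives 

$$
C_u^\circ - \log(\rho N_t) + \log\log N_t \to \log\!\left(\frac{\rho\bar{L}\bar{N}_r}{e(\rho+1)}\right). \qquad (136)
$$
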

We can apply the same techniques to the lower bound on achievable rate to determine the 
behavior of the optimal parameters. In addition, we must show that 

$$
\frac{\partial c(N_t)}{\partial \bar{T}} \to 0, \qquad \frac{\partial c(N_t)}{\partial \bar{B}} \to 0 \qquad (137)
$$

where $c(N_t)$ is given by (40). As $N_t \to \infty$ with fixed $\bar{N}_r$ and $\bar{B}$, $\eta$ converges in the mean square 
sense to a deterministic value [14]. Hence $\sigma_\eta^2 \to 0$, so that $c(N_t) \to 0$ for all $\bar{T}$ and $\bar{B}$. The 
limits (137) then follow since $\eta$ is asymptotically well-behaved (bounded and smooth) and the 
mean and variance clearly converge uniformly over all $\bar{T}$ and $\bar{B}$. 
The training and feedback overhead that maximize the lower bound on achievable rate therefore 
satisfy 

$$
\begin{aligned}
\bar{T}_l^\circ \log N_t &\to \bar{L} && (138)\\
\bar{B}_l^\circ \log^2 N_t &\to \frac{\bar{L}^2\log 2}{2\mu^2\bar{N}_r} && (139)
\end{aligned}
$$

and substituting into the expression for $C_l$ gives 

$$
C_l^\circ - \log(\rho N_t) + \log\log N_t \to \log\!\left(\frac{\rho\bar{L}\bar{N}_r}{e(\rho+1)}\right) - \log(1+\rho). \qquad (140)
$$

Furthermore, if the training and feedback overhead do not satisfy (43) and (44), then it can be 
shown that the bounds on achievable rate cannot increase as fast with $N_t$. Hence this establishes 
the theorem. 
Fig. 1. The capacity bounds in Theorem 1 (bits/channel use) versus number of transmit antennas. 
Fig. 2. Achievable rate versus normalized packet length L. 
Fig. 3. Lower bound on capacity versus normalized training and feedback (T + μB)/L with different allocations T/(μB). 
Fig. 4. Optimized training and feedback overhead, and fraction of data symbols {T°/L, B°/L, D°/L} versus number of 
transmit antennas Nt. 
Fig. 5. Achievable rate for MIMO channel versus number of transmit antennas Nt with different assumptions about channel 
knowledge at the receiver and transmitter. Also shown is the optimized capacity lower bound for the corresponding MISO 
channel. 
Fig. 6. Lower bound on beamforming capacity for MIMO channel versus normalized training and feedback (T + μB)/L. 